Entropies of Mixing and the Lorenz Order 
Entropies of mixing can be derived directly from the parent distributions of extreme value theory. They correspond to pseudo-additive entropies in the case of the Pareto and power function distributions, and to the Shannon entropy in the case of the exponential distribution. The former tend to the latter when their shape parameters tend to infinity and zero, respectively. Hence processes whose entropies of mixing are pseudo-additive entropies majorize, in the Lorenz order sense, those whose entropy is the Shannon entropy. In the case of the arcsine distribution, the maximal properties of regular polygons correspond to the maximum entropy of mixing.

MAJORIZATION AND THE LORENZ ORDER

The employment of entropies as measures of inequality dates back a long way. Majorization, as well as Lorenz ordering, consists of a partial order of $n$-tuples that lie in the positive orthant of $n$-dimensional Euclidean space, $\mathbb{R}^n$ [2]. Comparison of different populations can be made with respect to their diversity. Arranging two $n$-tuples, $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$, in increasing order, $x_{1:n} \le x_{2:n} \le \cdots \le x_{n:n}$ and $y_{1:n} \le y_{2:n} \le \cdots \le y_{n:n}$, a vector $y$ is said to majorize another vector $x$, $y \succ x$, if

$$ \sum_{i=1}^{k} y_{i:n} \le \sum_{i=1}^{k} x_{i:n}, \quad k = 1, 2, \ldots, n-1, \qquad \text{and} \qquad \sum_{i=1}^{n} y_{i:n} = \sum_{i=1}^{n} x_{i:n}. $$

Weak majorization converts the last equality into an inequality like the former ones [1, pp. 9ff]. Expressed in words, $y \succ x$ says that $x$ does not exhibit more inequality than $y$.
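The definition can be checked mechanically. The following Python sketch is only an illustration under our own assumptions (finite lists of non-negative numbers and a small floating-point tolerance `tol`); the helper names `majorizes` and `weakly_majorizes` are hypothetical and are not taken from the references.

```python
from itertools import accumulate


def majorizes(y, x, tol=1e-12):
    """True if y majorizes x: with both n-tuples sorted in increasing order,
    each partial sum of the k smallest y's is <= the corresponding partial
    sum of the x's (k = 1, ..., n-1), and the two totals are equal."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    xs = list(accumulate(sorted(x)))
    ys = list(accumulate(sorted(y)))
    if abs(xs[-1] - ys[-1]) > tol:
        return False  # equal totals are required
    return all(yk <= xk + tol for yk, xk in zip(ys[:-1], xs[:-1]))


def weakly_majorizes(y, x, tol=1e-12):
    """Weak version: the equality of totals is replaced by the same '<='
    inequality used for the partial sums, i.e. k runs over 1, ..., n."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return all(yk <= xk + tol
               for yk, xk in zip(accumulate(sorted(y)), accumulate(sorted(x))))


# One person holding everything majorizes an equal split of the same total.
print(majorizes([0, 0, 3], [1, 1, 1]))  # True
print(majorizes([1, 1, 1], [0, 0, 3]))  # False
```

Swapping the arguments in the second call gives `False`, a reminder that majorization is only a partial order: many pairs of vectors are not comparable at all.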

The characteristic bow-shaped Lorenz curves, indicating that there are many more poor people than rich ones, are obtained by plotting the points

$$ \left(\frac{k}{n}, \frac{\sum_{i=1}^{k} x_{i:n}}{\sum_{i=1}^{n} x_{i:n}}\right) \qquad \text{and} \qquad \left(\frac{k}{n}, \frac{\sum_{i=1}^{k} y_{i:n}}{\sum_{i=1}^{n} y_{i:n}}\right), $$

and using linear interpolation. If the Lorenz curve of $x$ is obtained by raising the $y$ curve at one or several points, then $y \succ x$. Lorenz curves are convex and, prosaically speaking, "as the bow is bent so inequality (or the concentration of wealth) increases".
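A short Python sketch of the plotting recipe just described, assuming a finite list of non-negative incomes with a positive total. `lorenz_points` is a hypothetical helper that returns the vertices $(k/n, \text{share})$, $k = 0, \ldots, n$, which are then joined by straight lines.

```python
def lorenz_points(values):
    """Vertices (k/n, sum of the k smallest values / total), k = 0, ..., n,
    of the empirical Lorenz curve; joining them by straight lines gives the
    piecewise-linear curve described in the text."""
    v = sorted(values)
    n, total = len(v), float(sum(v))
    if n == 0 or total <= 0 or v[0] < 0:
        raise ValueError("need non-negative values with a positive total")
    points = [(0.0, 0.0)]
    running = 0.0
    for k, vk in enumerate(v, start=1):
        running += vk
        points.append((k / n, running / total))
    return points


equal = lorenz_points([1, 1, 1])    # lies on the 45-degree line
unequal = lorenz_points([0, 0, 3])  # most bent bow possible for n = 3
# The curve of the more equal vector is never below the other one,
# consistent with [0, 0, 3] majorizing [1, 1, 1].
print(all(le >= lu for (_, le), (_, lu) in zip(equal, unequal)))  # True
```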

Majorization involves a partial ordering of vectors in $\mathbb{R}^n$. If $n$ is so large as to make partial ordering impractical, sufficient conditions for majorization can be obtained by identifying a Schur-convex function. Oddly enough, Schur's theory of convex functions preceded the theory of majorization by about a decade. A function $f$ is said to be Schur-convex when $y \succ x$ implies $f(y) \ge f(x)$. In particular contexts, like thermodynamics and ecology, the measures employed should be maximal when all the components are equal; here enters the notion of entropy. In so doing, we transfer our attention from a Schur-convex to a Schur-concave function. Until now there has been no unique relation between the entropy employed and the underlying probability distribution. The maximum entropy formalism chooses the form of the entropy and derives an exponential family of distributions by invoking particular constraints through a variational procedure. Here, we shall show that the probability distribution determines the form of the entropy uniquely via the Lorenz function.

Lorenz order has meaning for continuous, as well as discrete, probability distributions, but only for those that have support on the non-negative reals and have finite means. In fact, the Lorenz function is defined in terms of the normalized incomplete mean [cf. (3) below]. For certain distributions, like the Pareto one, the condition that the mean be finite imposes restrictions on the shape parameter that will appear in the characteristic exponents of the corresponding entropies of mixing. Although it has been lamented that the list of Lorenz functions that can be obtained in closed form is remarkably short [?, p. 34], we will show that it contains all the parent distributions of extreme value theory as well as the arcsine law. Since there are only three families of extreme value distributions, this places an upper limit on the number of Lorenz curves that can be obtained in closed form, apart from those obtained through geometrical inequalities, like the arcsine distribution.

In fact, the appearance of extreme value theory was not unexpected, since it enters naturally into order statistics. Although ordering destroys the independence and common distribution of the initial random variables, it does lead to some very simple results for extreme values when the sample size is allowed to increase without limit. That is, if we consider the $k$th order statistic in a sample of size $n$ and let both $k$ and $n$ tend to infinity such that $k/n$ remains constant, the asymptotic distributions of the quantiles are obtained [?]. If, instead, $k$ is held fixed and the sample size is allowed to increase without limit, the asymptotic distributions of extreme values result. The parent distributions that lie in the domain of attraction of the extreme value distributions for largest value are the exponential, Pareto, and power tail distributions. These distributions are attracted to the double exponential, or what is commonly referred to as the Gumbel, distribution, the Frechet distribution, and the reversed Weibull distribution, respectively. We will, instead, deal with the power function distribution, which generates the Weibull distribution for smallest value, since it has the same entropy of mixing and is simpler to handle.

Although these families of extreme value distributions 
appear as distinct, we will show that their entropies of 
mixing, which are derived from the parent distributions, 
all tend to a single entropy of mixing as the shape param- 
eter approaches a certain limit. On the same Lorenz plot, 
we shall see that the entropies of mixing are cigar-shaped 
curves, and the cigar-shaped curve of the entropy of mix- 
ing of the exponential distribution is nested in those of 
the Pareto and power function entropies of mixing. That 
is to say that the entropy of mixing of the exponential 
distribution will be smaller than those of the Pareto and 
power function distributions. The entropies of mixing 
can thus be used in the process of Lorenz ordering and 
are Schur-concave functions that are determined uniquely 
by their Lorenz functions. 

The Lorenz function determines the probability distribution up to a scale transformation [?, p. 32], and the difference between the tail of the Lorenz function and the Lorenz function itself determines the entropy of mixing. This is analogous to the difference between the upper and lower quartiles, which is used as a measure of dispersion of the distribution [4]. The difference in the Lorenz functions is a measure of uncertainty, and uncertainty will be greatest when the probabilities are equal. This results in the maximum entropy of mixing.

As a measure of inequality, the entropies of mixing are 
comparable to the Gini index, which is defined as twice 
the area between the Lorenz curve and the 45° line, or the 
unbent bow, representing an equal distribution of income 
for all individuals in the population. Whereas the Gini 
index is a global criterion of inequality, the entropies of 
mixing are local. 
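Since the comparison with the Gini index is made only in words, here is a minimal Python sketch of the index as defined above, computed from the piecewise-linear empirical Lorenz curve. Evaluating the area with trapezoids of width $1/n$ is our choice, and `gini_index` is a hypothetical helper name.

```python
def gini_index(values):
    """Gini index of non-negative values: twice the area between the
    45-degree line and the piecewise-linear empirical Lorenz curve,
    i.e. 1 - 2 * (area under the Lorenz curve)."""
    v = sorted(values)
    n, total = len(v), float(sum(v))
    if n == 0 or total <= 0 or v[0] < 0:
        raise ValueError("need non-negative values with a positive total")
    area, prev, running = 0.0, 0.0, 0.0
    for vk in v:
        running += vk
        cur = running / total
        area += (prev + cur) / (2 * n)  # trapezoid of width 1/n
        prev = cur
    return 1.0 - 2.0 * area


print(gini_index([1, 1, 1]))  # ~0: equal distribution (the unbent bow)
print(gini_index([0, 0, 3]))  # ~0.667, the largest value possible for n = 3
```

A single number of this kind summarizes the whole curve, which is the sense in which the Gini index is a global criterion, whereas the entropies of mixing are evaluated at each $p$.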

The entropies of mixing obtained from the parent dis- 
tributions are not unfamiliar in the information sciences. 
Those that are obtained from the Pareto and power function distributions will be recognized as pseudo-additive entropies (pae) [5], which have been popularized by Tsallis in the physical sciences [6]. In well-defined limits of the shape parameters of the distributions, these pae transform into the Shannon entropy corresponding to the exponential distribution.

Another expression for the Lorenz function that can 
be obtained in closed form corresponds to the arcsine 
law. The maximum entropy of mixing of this distribution 
will be shown to coincide with the maximal properties of 
regular polygons. In this way the maximum uncertainty 
expressed in terms of entropy of mixing is related to the 
geometrical conditions for greatest area and perimeter of 
regular polygons. 



LORENZ FUNCTION AND ENTROPY OF MIXING

The Lorenz curve plots the percentage of total income held by various fractions of the population, ordered by increasing size of their incomes. Although Lorenz functions have been known in parametric form for quite some time, it is only relatively recently that a definition in terms of a single equation, rather than two, has been advanced [?].

The conventional definition of the Lorenz curve is the following [?, Sec. 2.25]. Given the cumulative distribution function

$$ p = F(x) = \int_0^x f(t)\,dt, \tag{1} $$

one solves for $x$ and writes the Lorenz function as the normalized incomplete mean

$$ L(p) = \frac{1}{\mu}\int_0^x t\,f(t)\,dt, $$

where $\mu = \int_0^\infty t\,dF(t)$ is the finite mean, and $x$ and $p$ are related by (1).

An alternative formulation [7] uses the fact that the solution to (1) is

$$ F^{-1}(t) = \sup_x \{x : F(x) \le t\}, \qquad 0 < t < 1. $$

In terms of the inverse distribution function, the mean is given by the Riemann integral

$$ \mu = \int_0^1 F^{-1}(t)\,dt. \tag{2} $$

And in terms of $F^{-1}$, the Lorenz function is defined as [7]

$$ L(p) = \frac{1}{\mu}\int_0^p F^{-1}(t)\,dt. \tag{3} $$
The Lorenz function (3) is non-decreasing and convex on $[0, 1]$, with $L(0) = 0$ and $L(1) = 1$. Most importantly, the Lorenz function determines the distribution function $F$ up to a scale transformation,

$$ L'(p) = F^{-1}(p)/\mu, \tag{4} $$

since $F^{-1}$ determines $F$. Moreover, if $L$ is twice differentiable, then $F$ will have a finite positive density given by

$$ f(x) = \left[\mu\,L''(F(x))\right]^{-1}. $$
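As a numerical illustration of (3) and (4), the sketch below integrates an inverse distribution function with the composite midpoint rule (our choice of quadrature; `lorenz_from_quantile` and `exp_quantile` are hypothetical names). It uses the exponential law $F(x) = 1 - e^{-x/\mu}$, whose Lorenz function $(1-p)\log(1-p) + p$ is quoted later in the text, and shows that changing $\mu$ leaves $L(p)$ unchanged, which is the scale invariance mentioned above.

```python
import math


def lorenz_from_quantile(quantile, mean, p, steps=20_000):
    """Approximate eq. (3), L(p) = (1/mu) * integral_0^p F^{-1}(t) dt,
    with the composite midpoint rule (an illustrative choice)."""
    if p <= 0.0:
        return 0.0
    h = p / steps
    return h * sum(quantile((i + 0.5) * h) for i in range(steps)) / mean


def exp_quantile(mu):
    """Inverse distribution function of F(x) = 1 - exp(-x/mu)."""
    return lambda t: -mu * math.log1p(-t)


p = 0.3
for mu in (1.0, 5.0):  # two scales of the same family give the same L(p)
    L = lorenz_from_quantile(exp_quantile(mu), mu, p)
    print(mu, L, (1 - p) * math.log(1 - p) + p)  # numeric vs closed form

# Eq. (4): L'(p) = F^{-1}(p)/mu, checked with a central difference.
mu, d = 2.0, 1e-3
slope = (lorenz_from_quantile(exp_quantile(mu), mu, p + d)
         - lorenz_from_quantile(exp_quantile(mu), mu, p - d)) / (2 * d)
print(slope, exp_quantile(mu)(p) / mu)
```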

In fact, von Mises's (sufficient) conditions for a distribution to be in the domain of attraction of an extreme value distribution can be expressed very succinctly in terms of the ratios of the derivatives of the Lorenz function, without having to take the limit where the variate tends to its right end point, $\sup_x \{x : F(x) < 1\}$. For distributions unlimited on the right, like the Pareto law, $F^{-1}(1) = \infty$ and

$$ \alpha(1-p) = \frac{L'(p)}{L''(p)}. $$

For those that are limited on the right, $F^{-1}(1) < \infty$ and

$$ \beta(1-p) = \frac{L'(1) - L'(p)}{L''(p)}. $$

Finally, for the exponential distribution the condition is

$$ 1 - p = \frac{1}{L''(p)}. $$
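The three ratio conditions can be checked numerically with finite differences. In this hedged sketch the Pareto law uses $L(p) = 1 - (1-p)^{(\alpha-1)/\alpha}$ (derived in the next section), the exponential law uses $L(p) = (1-p)\log(1-p) + p$, and, as our choice of a distribution limited on the right, we take $F(x) = 1 - (1-x)^{\beta}$ on $[0, 1]$, whose Lorenz function $L(p) = (\beta+1)p + \beta[(1-p)^{(\beta+1)/\beta} - 1]$ follows from (3) with $\mu = 1/(\beta+1)$. The step size and the values $\alpha = 3$, $\beta = 2$ are illustrative assumptions.

```python
import math


def d1(f, p, h=1e-4):
    """Central first difference of f at p."""
    return (f(p + h) - f(p - h)) / (2 * h)


def d2(f, p, h=1e-4):
    """Central second difference of f at p."""
    return (f(p + h) - 2 * f(p) + f(p - h)) / (h * h)


ALPHA, BETA = 3.0, 2.0  # illustrative shape parameters


def L_pareto(p):
    """Pareto Lorenz function (unlimited on the right)."""
    return 1 - (1 - p) ** ((ALPHA - 1) / ALPHA)


def L_bounded(p):
    """Lorenz function of F(x) = 1 - (1 - x)**BETA on [0, 1] (limited on the right)."""
    return (BETA + 1) * p + BETA * ((1 - p) ** ((BETA + 1) / BETA) - 1)


def L_exp(p):
    """Lorenz function of the exponential law."""
    return (1 - p) * math.log(1 - p) + p


DL_BOUNDED_AT_1 = BETA + 1  # L'(1) = F^{-1}(1)/mu = 1/mu = BETA + 1, by eq. (4)

for p in (0.2, 0.5, 0.8):
    print(f"p={p}: "
          f"{d1(L_pareto, p) / d2(L_pareto, p):.6f} vs {ALPHA * (1 - p):.6f}; "
          f"{(DL_BOUNDED_AT_1 - d1(L_bounded, p)) / d2(L_bounded, p):.6f} "
          f"vs {BETA * (1 - p):.6f}; "
          f"{1 / d2(L_exp, p):.6f} vs {1 - p:.6f}")
```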
The tail of the Lorenz function can likewise be defined as

$$ \bar L(p) := 1 - L(1-p) = \frac{1}{\mu}\int_{1-p}^{1} F^{-1}(t)\,dt. \tag{5} $$

Since $L(p)$ is convex, $\bar L(p)$ is concave. The difference between (5) and (3) gives the entropy of mixing as

$$ S(p) = \frac{\bar L(p) - L(p)}{1 - 2^{-D}} = \frac{1 - L(1-p) - L(p)}{1 - 2^{-D}} \tag{6} $$

after having been suitably normalized, where the exponent $D$ in the normalizing denominator is given in terms of the shape parameter of the distribution and can be either positive or negative.
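The construction of (6) can be sketched generically in Python: given any Lorenz function `L`, form $\bar L(p) - L(p) = 1 - L(1-p) - L(p)$ and divide by its value at $p = 1/2$. Normalizing by the value at $p = 1/2$ (so that $S(1/2) = 1$) is our reading of "suitably normalized", not a formula from the text, and the helper name `entropy_of_mixing` is hypothetical. The two test Lorenz functions are the exponential and arcsine ones given later in the text.

```python
import math


def entropy_of_mixing(L, p):
    """Normalized gap between the tail 1 - L(1 - p) and the Lorenz function L(p).
    Assumption: the normalizing constant is the gap at p = 1/2, so S(1/2) = 1."""
    def gap(u):
        return 1.0 - L(1.0 - u) - L(u)
    return gap(p) / gap(0.5)


def L_exponential(p):
    # (1 - p) log(1 - p) -> 0 as p -> 1, so L(1) = 1
    return (1.0 - p) * math.log(1.0 - p) + p if p < 1.0 else 1.0


def L_arcsine(p):
    return p - math.sin(math.pi * p) / math.pi


p = 0.2
shannon = (-p * math.log(p) - (1 - p) * math.log(1 - p)) / math.log(2)
print(entropy_of_mixing(L_exponential, p), shannon)            # agree
print(entropy_of_mixing(L_arcsine, p), math.sin(math.pi * p))  # agree
```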



ENTROPIES OF MIXING OF PARENT DISTRIBUTIONS

The classic example of a Lorenz function is obtained from the untranslated Pareto distribution

$$ F(x) = 1 - \left(\frac{x_0}{x}\right)^{\alpha}, \qquad x \ge x_0, \tag{7} $$

describing the distribution of salaries above $x_0$. In order for there to be a finite mean, $\alpha > 1$. Let $x = F^{-1}(p)$, so that

$$ F^{-1}(p) = x_0 (1-p)^{-1/\alpha}. $$

Since the derivative of the Lorenz function is related to the inverse distribution function according to (4), we obtain

$$ L(p) = 1 - (1-p)^{(\alpha-1)/\alpha} \tag{8} $$

upon integrating, evaluating the limits as given in (3), and using the expression $\mu = \alpha x_0/(\alpha - 1)$ for the mean.
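The integration leading to (8) can be cross-checked numerically. This is a minimal sketch, assuming a midpoint-rule quadrature of $F^{-1}(t) = x_0(1-t)^{-1/\alpha}$ on $[0, p]$; the function names are hypothetical, and the check is restricted to $p < 1$, where the integrand is bounded.

```python
def pareto_lorenz_numeric(p, alpha, x0=1.0, steps=20_000):
    """Eq. (3) for the Pareto law (7): integrate F^{-1}(t) = x0 (1 - t)^(-1/alpha)
    over [0, p] with the midpoint rule and divide by mu = alpha x0 / (alpha - 1)."""
    if alpha <= 1:
        raise ValueError("a finite mean requires alpha > 1")
    mu = alpha * x0 / (alpha - 1.0)
    h = p / steps
    total = sum(x0 * (1.0 - (i + 0.5) * h) ** (-1.0 / alpha) for i in range(steps))
    return h * total / mu


def pareto_lorenz_closed(p, alpha):
    """Eq. (8)."""
    return 1.0 - (1.0 - p) ** ((alpha - 1.0) / alpha)


for alpha in (1.5, 3.0):
    for p in (0.25, 0.5, 0.9):
        print(alpha, p, pareto_lorenz_numeric(p, alpha), pareto_lorenz_closed(p, alpha))
```

The scale $x_0$ cancels between $F^{-1}$ and $\mu$, so the result does not depend on it.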

The tail of the Lorenz curve, defined in (5), will be given by the expression

$$ \bar L(p) = p^{(\alpha-1)/\alpha}. \tag{9} $$

Whereas (8) represents the fraction of total income possessed by the lowest $p$-th fraction of the population, (9) represents the fraction of total income held by the highest $p$-th fraction.

The difference between the Lorenz functions, (9) and (8), yields an entropy of mixing

$$ S_-(p) = \frac{p^{(\alpha-1)/\alpha} + (1-p)^{(\alpha-1)/\alpha} - 1}{2^{1/\alpha} - 1}. \tag{10} $$

This is precisely a normalized pae of information theory [5]. The entropy of mixing (10) reflects the symmetry between the top and bottom order statistics. The difference in Lorenz functions is a measure of uncertainty, just as the difference in quartiles is a measure of dispersion. The maximum uncertainty occurs when $p = 1/2$, for which $S_-(1/2) = 1$.
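A small Python sketch confirming that (10) is the normalized difference of (9) and (8), that it is symmetric under $p \to 1-p$, and that it peaks at $S_-(1/2) = 1$. The grid of $p$ values and the parameter choices are illustrative; the helper names are hypothetical.

```python
def pareto_entropy(p, alpha):
    """S_-(p) of eq. (10)."""
    q = (alpha - 1.0) / alpha
    return (p ** q + (1.0 - p) ** q - 1.0) / (2.0 ** (1.0 / alpha) - 1.0)


def pareto_entropy_from_lorenz(p, alpha):
    """Same quantity built from (8) and (9), normalized by its value at p = 1/2."""
    q = (alpha - 1.0) / alpha

    def lorenz(u):  # eq. (8)
        return 1.0 - (1.0 - u) ** q

    def tail(u):    # eq. (9)
        return u ** q

    return (tail(p) - lorenz(p)) / (tail(0.5) - lorenz(0.5))


grid = [i / 100 for i in range(101)]
alpha = 2.5
print(max(abs(pareto_entropy(p, alpha) - pareto_entropy_from_lorenz(p, alpha)) for p in grid))
print(pareto_entropy(0.5, alpha), max(pareto_entropy(p, alpha) for p in grid))
```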

As a second example, consider the exponential distribution

$$ F(x) = 1 - e^{-x/\mu}. \tag{11} $$

The Lorenz functions are $L(p) = (1-p)\log(1-p) + p$ and $\bar L(p) = p - p\log p$, so that the corresponding normalized entropy of mixing is

$$ S_0(p) = \frac{-p\log p - (1-p)\log(1-p)}{\log 2}, \tag{12} $$

which will be recognized as the (normalized) Shannon entropy of information theory. The normalization has been chosen such that the entropy is maximal for $p = 1/2$, viz., $S_0(1/2) = 1$ [?, p. 34].

The entropy of mixing of the Pareto law, (10), transforms into the entropy of mixing of the exponential distribution, (12), in the limit as $\alpha \uparrow \infty$. Since the exponential law (11) is the parent distribution of the double exponential distribution of largest value, this distribution can be seen to play a limiting role with regard to the two other extreme value distributions, the Frechet and reversed Weibull distributions.
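The limit $\alpha \uparrow \infty$ can be watched numerically. The sketch below, with an illustrative grid of interior points (to avoid $0\log 0$) and illustrative values of $\alpha$, prints the largest gap between (10) and (12); the function names are hypothetical.

```python
import math


def pareto_entropy(p, alpha):
    """S_-(p) of eq. (10)."""
    q = (alpha - 1.0) / alpha
    return (p ** q + (1.0 - p) ** q - 1.0) / (2.0 ** (1.0 / alpha) - 1.0)


def shannon_entropy(p):
    """S_0(p) of eq. (12), normalized so that S_0(1/2) = 1."""
    return (-p * math.log(p) - (1.0 - p) * math.log(1.0 - p)) / math.log(2.0)


grid = [i / 100 for i in range(1, 100)]  # interior points avoid 0 log 0
gaps = [max(abs(pareto_entropy(p, a) - shannon_entropy(p)) for p in grid)
        for a in (10.0, 100.0, 1_000.0, 10_000.0)]
print(gaps)  # shrinks steadily as alpha grows
```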

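In fact, the entropy of mixing associated with the power function distribution,

$$ F(x) = \left(\frac{x}{x_0}\right)^{\beta}, \qquad 0 \le x \le x_0, \tag{13} $$

which is limited on the right by $x_0 = \sup_x \{x : F(x) < 1\}$, will also tend to the Shannon entropy (12) in the limit where the positive shape parameter $\beta \downarrow 0$. The Lorenz functions are $L(p) = p^{(\beta+1)/\beta}$ and $\bar L(p) = 1 - (1-p)^{(\beta+1)/\beta}$, so that the entropy of mixing is¹

$$ S_+(p) = \frac{1 - (1-p)^{(\beta+1)/\beta} - p^{(\beta+1)/\beta}}{1 - 2^{-1/\beta}}, \tag{14} $$

where we have set $D = 1/\beta$ in (6). In the limit $\beta \downarrow 0$, the pae (14) transforms into the Shannon entropy (12).

¹ The same entropy of mixing would have been obtained had we treated the tail of the power function,

$$ F(x) = 1 - (1-x)^{\beta}, $$

which generates the reversed Weibull distribution for largest value. This is the distribution of lengths obtained when $\beta$ independent and randomly chosen points partition the interval $(0, 1)$ into $\beta + 1$ intervals. The entropy of mixing remains the same even though $L(p)$ and $\bar L(p)$ have more complicated forms, viz., $\bar L(p) = (\beta+1)p - \beta p^{(\beta+1)/\beta}$ and $L(p) = (\beta+1)p + \beta[(1-p)^{(\beta+1)/\beta} - 1]$, where the mean, $\mu = (\beta+1)^{-1}$, has been used to evaluate the expressions. Both vanish at $p = 0$ and are unity at $p = 1$. Their difference is precisely (14) when properly normalized, the normalizing denominator being $\beta(1 - 2^{-1/\beta})$.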
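The footnote's claim can be verified numerically. This hedged sketch (hypothetical helper names, illustrative values of $\beta$ and $p$) recomputes the footnote's $L(p)$ from (3) with $F^{-1}(t) = 1 - (1-t)^{1/\beta}$ and $\mu = 1/(\beta+1)$, checks $\bar L(p) = 1 - L(1-p)$, and compares the normalized gap with (14), including the normalization $S_+(1/2) = 1$.

```python
def power_entropy(p, beta):
    """S_+(p) of eq. (14); r = (beta + 1)/beta is the Lorenz exponent of (13)."""
    r = (beta + 1.0) / beta
    return (1.0 - (1.0 - p) ** r - p ** r) / (1.0 - 2.0 ** (-1.0 / beta))


def footnote_lorenz_closed(p, beta):
    """Footnote's L(p) for F(x) = 1 - (1 - x)**beta."""
    r = (beta + 1.0) / beta
    return (beta + 1.0) * p + beta * ((1.0 - p) ** r - 1.0)


def footnote_tail_closed(p, beta):
    """Footnote's tail, which should equal 1 - L(1 - p)."""
    r = (beta + 1.0) / beta
    return (beta + 1.0) * p - beta * p ** r


def footnote_lorenz_numeric(p, beta, steps=20_000):
    """Eq. (3) with F^{-1}(t) = 1 - (1 - t)**(1/beta) and mu = 1/(beta + 1)."""
    if p <= 0.0:
        return 0.0
    h = p / steps
    s = sum(1.0 - (1.0 - (i + 0.5) * h) ** (1.0 / beta) for i in range(steps))
    return h * s * (beta + 1.0)


def footnote_entropy(p, beta):
    """Footnote gap normalized by beta * (1 - 2**(-1/beta))."""
    gap = footnote_tail_closed(p, beta) - footnote_lorenz_closed(p, beta)
    return gap / (beta * (1.0 - 2.0 ** (-1.0 / beta)))


BETAS = (0.5, 1.0, 2.0, 5.0)
PS = (0.1, 0.3, 0.5, 0.9)
for b in BETAS:
    print(b, [round(power_entropy(p, b), 6) for p in PS],
          [round(footnote_entropy(p, b), 6) for p in PS])
```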

Therefore, as the shape parameter of the Pareto distribution (7) increases, or the shape parameter of the power function distribution (13) decreases, there is a decrease in inequality as measured by Lorenz ordering. The criterion of majorization, or Lorenz ordering, is given by the inequality of the entropies of mixing

$$ S_\pm(p) \ge S_0(p). \tag{15} $$

The inequality says that the Lorenz curves of the exponential distribution are nested in the Lorenz curves of the Pareto and power function distributions. In other words, any process whose distribution is exponential does not exhibit more inequality in the Lorenz sense than a process governed by either the Pareto or the power function distribution.
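The Pareto side of (15), together with the decrease of $S_-$ as $\alpha$ grows, can be spot-checked on a grid. This sketch covers only $S_-$ versus $S_0$ (it does not test the power function side), and the grid, tolerance and values of $\alpha$ are illustrative choices.

```python
import math


def pareto_entropy(p, alpha):
    """S_-(p) of eq. (10)."""
    q = (alpha - 1.0) / alpha
    return (p ** q + (1.0 - p) ** q - 1.0) / (2.0 ** (1.0 / alpha) - 1.0)


def shannon_entropy(p):
    """S_0(p) of eq. (12)."""
    return (-p * math.log(p) - (1.0 - p) * math.log(1.0 - p)) / math.log(2.0)


grid = [i / 100 for i in range(1, 100)]
alphas = [1.5, 2.0, 3.0, 10.0, 100.0]  # increasing shape parameter

# Pareto side of (15): S_-(p) >= S_0(p) at every grid point.
pareto_above_shannon = all(pareto_entropy(p, a) >= shannon_entropy(p) - 1e-12
                           for a in alphas for p in grid)
# S_- does not increase as alpha increases (less inequality).
decreasing_in_alpha = all(pareto_entropy(p, a1) >= pareto_entropy(p, a2) - 1e-12
                          for p in grid for a1, a2 in zip(alphas, alphas[1:]))
print(pareto_above_shannon, decreasing_in_alpha)
```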

The entropies of mixing (10) and (14) are separable, Schur-concave functions. Considering (10), it can be written as the sum

$$ S_-(p) = \sum_{i=1}^{2} \sigma(p_i), \qquad p_1 = p, \quad p_2 = 1 - p, $$

where

$$ \sigma(p_i) = \frac{p_i^{(\alpha-1)/\alpha} - p_i}{2^{1/\alpha} - 1}. $$

Since $\sigma''(p_i) < 0$, we can apply Jensen's inequality in the form

$$ \sigma\!\left(\frac{1}{n}\sum_{i=1}^{n} p_i\right) \ge \frac{1}{n}\sum_{i=1}^{n} \sigma(p_i), $$

or, equivalently,

$$ \left(\frac{1}{n}\sum_{i=1}^{n} p_i^{(\alpha-1)/\alpha}\right)^{\alpha/(\alpha-1)} \le \frac{1}{n}\sum_{i=1}^{n} p_i. \tag{16} $$

The inequality in (16) follows from the fact that power means are increasing functions of their order [?], since $\alpha > 1$. In the limit as $\alpha \to 1$, (16) becomes the arithmetic-geometric mean inequality.

In particular, if we set $n = 2$, the entropy of mixing of the Pareto distribution is seen to be bounded from above,

$$ S_-(p) = \sum_{i=1}^{2} \sigma(p_i) \le 2\,\sigma\!\left(\frac{1}{2}\sum_{i=1}^{2} p_i\right) = 2\,\sigma\!\left(\tfrac{1}{2}\right) = 1. $$

This shows that $S_-(1/2) = 1$ is indeed maximal.

It follows from the above discussion that if $y \succ x$, i.e., $x$ does not show more inequality than $y$, the entropy of mixing of $y$ is greater than that of $x$. If $x$ and $y$ have Pareto distributions with shape parameters $\alpha_x > \alpha_y$, then $S_-^{(y)}(p) > S_-^{(x)}(p)$. The entropy of mixing decreases as the shape parameter of the Pareto distribution increases, corresponding to a decrease in inequality as measured by the Lorenz ordering.



MAXIMAL PROPERTIES OF REGULAR POLYGONS AND MAXIMUM ENTROPY OF MIXING


A closed-form expression for the Lorenz functions also exists for the arcsine distribution, which lies outside the traditional formulation of extreme value distributions. The arcsine law

$$ F(x) = \frac{2}{\pi}\arcsin\sqrt{x}, \qquad 0 \le x \le 1, \tag{17} $$

is a symmetric distribution whose probability density is greatest at the extremes, $x = 0$ and $x = 1$. The central value has the smallest probability density even though it coincides with the mean value, $\mu = 1/2$. This goes against intuition, which equates mean and most probable values. The Lorenz functions

$$ L(p) = p - \frac{\sin \pi p}{\pi} \qquad \text{and} \qquad \bar L(p) = p + \frac{\sin (1-p)\pi}{\pi} $$

reflect the symmetry of the arcsine distribution. Their difference gives the entropy of mixing as

$$ S(p) = \tfrac{1}{2}\left[\sin \pi p + \sin (1-p)\pi\right] = \sin \pi p \tag{18} $$

upon normalization; it is concave and attains its maximum, $S(1/2) = 1$, at $p = 1/2$.
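A minimal numerical check of the arcsine case, assuming a midpoint-rule evaluation of (3) with the inverse of (17), $F^{-1}(t) = \sin^2(\pi t/2)$, and normalization of the gap by its value $2/\pi$ at $p = 1/2$. Function names and sample points are illustrative.

```python
import math


def arcsine_quantile(t):
    """Inverse of F(x) = (2/pi) arcsin(sqrt(x)), eq. (17): x = sin^2(pi t / 2)."""
    return math.sin(0.5 * math.pi * t) ** 2


def lorenz_numeric(p, steps=20_000, mean=0.5):
    """Eq. (3) by the midpoint rule; the arcsine law has mean 1/2."""
    if p <= 0.0:
        return 0.0
    h = p / steps
    return h * sum(arcsine_quantile((i + 0.5) * h) for i in range(steps)) / mean


def lorenz_closed(p):
    return p - math.sin(math.pi * p) / math.pi


def mixing_entropy(p):
    """(tail - Lorenz) normalized by its value 2/pi at p = 1/2."""
    gap = (1.0 - lorenz_numeric(1.0 - p)) - lorenz_numeric(p)
    return gap / (2.0 / math.pi)


for p in (0.1, 0.25, 0.5, 0.9):
    print(p, lorenz_numeric(p), lorenz_closed(p), mixing_entropy(p), math.sin(math.pi * p))
```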

The entropy of mixing (18) is easily generalized to a set of $n$ probabilities, viz.,

$$ S(p_1, \ldots, p_n) = \frac{1}{n}\sum_{i=1}^{n} \sin \pi p_i, $$

where the set $\{p_i\}$ is assumed to form a complete distribution. Since $(\sin\theta)'' < 0$ for $0 < \theta < \pi$, Jensen's inequality gives [?, p. 99]

$$ \sin\left(\pi \sum_{i=1}^{n} p_i / n\right) > \frac{1}{n}\sum_{i=1}^{n} \sin \pi p_i \tag{19} $$

unless all the $p_i$ are equal. Now, since the $p_i$ are assumed to form a complete distribution, (19) reduces to

$$ n \sin\frac{\pi}{n} \ge \sum_{i=1}^{n} \sin \pi p_i. \tag{20} $$

The left side of (20) is half the perimeter of a regular polygon of $n$ sides inscribed in a circle of unit radius. Let $O$ be the center of the circle and $P_0, P_1, \ldots, P_n$ the vertices of a polygon that lie on the circle, where $P_0 = P_n$ is fixed but $P_1, P_2, \ldots, P_{n-1}$ can vary. If the angle $P_{i-1}OP_i$ is identified with $2\pi p_i$, so that the side $P_{i-1}P_i$ has length $2\sin \pi p_i$, then (20) asserts that both the perimeter and the area of the polygon are greatest when the polygon is regular, viz., when all the sides $P_{i-1}P_i$, $i = 1, 2, \ldots, n$, are equal. Hence, the familiar maximal properties of regular polygons coincide with the maximum entropy of mixing, which occurs when all the probabilities are equal, $\angle P_{i-1}OP_i = 2\pi/n$ for $i = 1, 2, \ldots, n$.
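The geometric reading of (20) can be checked directly. This sketch places the vertices on the unit circle so that the central angle $P_{i-1}OP_i$ equals $2\pi p_i$ (which makes $\sin \pi p_i$ half of each side). It then compares the half-perimeter with $\sum_i \sin \pi p_i$ and with the regular-polygon value $n\sin(\pi/n)$. The random probabilities, the seed and the helper name `half_perimeter` are illustrative assumptions; `math.dist` requires Python 3.8 or later.

```python
import math
import random


def half_perimeter(probs):
    """Half the perimeter of the polygon inscribed in the unit circle whose
    central angles P_{i-1} O P_i are 2*pi*p_i (so that P_0 = P_n)."""
    angles = [0.0]
    for p in probs:
        angles.append(angles[-1] + 2.0 * math.pi * p)
    vertices = [(math.cos(a), math.sin(a)) for a in angles]
    return 0.5 * sum(math.dist(vertices[i - 1], vertices[i])
                     for i in range(1, len(vertices)))


rng = random.Random(1)  # fixed seed for reproducibility
n = 6
weights = [rng.random() for _ in range(n)]
total = sum(weights)
probs = [w / total for w in weights]
print(half_perimeter(probs), sum(math.sin(math.pi * p) for p in probs))  # agree to rounding
print(n * math.sin(math.pi / n))  # half-perimeter of the regular hexagon, the upper bound in (20)
```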